Abstract — In this work we examine a face recognition system
based on advanced correlation filters. A thorough
theoretical design analysis of Minimum Average
Correlation Energy (MACE) filters and the Optimum Trade-off
Synthetic Discriminant Function (OTSDF), also commonly
known as the Optimal Trade-off Filter (OTF), is provided. In
practice, one of the major computational aspects of
correlation filter design is the representation of complex
floating-point Discrete Fourier Transform (DFT)
coefficients in limited-precision memory. To overcome
the floating-point memory requirement of
correlation-based filters on systems with limited
computational resources, we propose the use of Discrete
Cosine Transform Sign-Only Correlation (DCTSOC), which
deals only with the sign information of the Discrete
Cosine Transform (DCT). The proposed method is tested for
synthesis of the OTF, and the recognition rates for
frontal face identification are compared between the OTF
using DCTSOC (OTDS) and the standard OTF.

Index Terms — correlation filters, discrete cosine transform, 
face recognition, synthetic discriminant function 

I. Introduction 

Correlation filters have been applied successfully to a
variety of applications, such as real-time object
tracking [1], automatic target recognition [2] and recognition
of biometrics, e.g., face, fingerprint and iris [3]. It is well
known that matched spatial filters are optimal in terms of
maximum output signal-to-noise ratio for the detection of
a known image in the presence of noise under the
assumption of Gaussian statistics [1]. One of the most
well known composite linear correlation filters is
the OTF [4], which was proposed to improve
the generalization properties of the MACE filter [5]. In
this paper we propose a new version of the OTF based on
DCT sign-only correlation, termed OTDS, which aims
to simplify the image representation and correlation
aspects of OTF design. In Section II, a thorough
theoretical design analysis of the MACE filter and the OTF is
provided. In Sections III and IV we formulate DCT sign-only
correlation as an alternative to the DFT-based correlation
approach used in the design of the standard OTF. In Section
VI, we present simulation results for frontal face
recognition systems designed using the OTDS approach and
the standard OTF approach.

II. Correlation Filters Background 

Since the first proposal of Synthetic Discriminant 
Function (SDF) [6] there has been a great deal of interest 
in the concept of SDF and its variants. 



For a given set of vectors $X = \{x_1, \ldots, x_M\}$, each of
dimension $D$, the correlation function between a vector
$x_i$ and a filter $h$ is defined as

$$\sigma_i(t) = x_i \star h = \sum_{j=1}^{D} x_{ij}\, h_{j+t} \qquad (1)$$

In SDF the value of each correlation function
at the origin is constrained to some preset value, i.e.

$$v_i = \sigma_i(0) = \sum_{j=1}^{D} x_{ij}\, h_j = x_i^{T} h, \qquad i = 1, \ldots, M \qquad (2)$$

Equation 2 can be represented in matrix form as

$$X^{T} h = v \qquad (3)$$

where $v$ is the desired output response of the filter $h$
and the columns of $X$ are the vectors $x_i$. In
practice it turns out that for cases where $M < D$ the
system has infinitely many solutions. If a unique
solution in the column space of $X$ is chosen, $h = X\theta$ for a
vector $\theta$, the filter can be expressed as

$$h = X (X^{T} X)^{-1} v \qquad (4)$$



where the $v_i$ are generally chosen to be one (output
constraint). The aim of the constraints is to control the
variation in the output due to changes in rotation and
scale. Under the assumption that the columns of $X$ are
independent, the SDF approach given in equation 4
attempts to design a filter robust against scale and
rotation by constraining the output to a specified value.
The issue of invariance to distortion or noise is addressed
by a variant of SDF called the Minimum Variance
Synthetic Discriminant Function (MVSDF). In MVSDF
the correlation filter is designed assuming the presence of
additive stationary noise denoted by $n$, i.e.

$$\tilde{x}_i = x_i + n \qquad (5)$$

where $\tilde{x}_i$ is the observed noisy image and $x_i$
represents the noiseless image, under the
assumption that the noise process is characterized by zero
mean and covariance matrix $C$. The output of the filter at
the origin can be written as

$$\sigma_i(0) = \tilde{x}_i^{T} h = x_i^{T} h + n^{T} h \qquad (6)$$
The variance of the output $\sigma_i(0)$ is given
by $\rho = h^{T} C h$. Minimizing this output variance reduces
the impact of $n$. The solution $h$ which minimizes $\rho$
under the constraint given by equation 3 is obtained using
Lagrange multipliers:

$$f(h, \lambda) = h^{T} C h - \lambda^{T} (X^{T} h - v) = \sum_{i=1}^{D} \sum_{j=1}^{D} c_{ij}\, h_i h_j - \sum_{i=1}^{M} \lambda_i \Big( \sum_{j=1}^{D} x_{ij}\, h_j - v_i \Big) \qquad (7)$$

Since $C$ is symmetric,

$$\frac{\partial f}{\partial h_n} = 2 \sum_{i=1}^{D} c_{in}\, h_i - \sum_{i=1}^{M} \lambda_i\, x_{in} = 0, \quad n = 1, \ldots, D, \qquad \text{i.e.} \qquad 2 C h - X \lambda = 0 \qquad (8)$$

Equation 8 represents a system of $D$ equations, the
solution of which is

$$h = \tfrac{1}{2}\, C^{-1} X \lambda \qquad (9)$$

where $\lambda = [\lambda_1\ \lambda_2\ \ldots\ \lambda_M]^{T}$. Substituting equation
9 into equation 3 we get

$$X^{T} h = X^{T} C^{-1} X\, \frac{\lambda}{2} = v \quad \Longrightarrow \quad \frac{\lambda}{2} = (X^{T} C^{-1} X)^{-1} v \qquad (10)$$

Using equation 9 and equation 10 we get the solution
for MVSDF as

$$h = C^{-1} X (X^{T} C^{-1} X)^{-1} v \qquad (11)$$
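The closed forms for SDF and MVSDF can be checked numerically. The following is a minimal sketch, assuming random vectors in place of images and a made-up diagonal noise covariance; the function names `sdf_filter` and `mvsdf_filter` are illustrative and not taken from the paper.

```python
import numpy as np

def sdf_filter(X, v):
    """SDF solution h = X (X^T X)^{-1} v; training vectors are the columns of X."""
    return X @ np.linalg.solve(X.T @ X, v)

def mvsdf_filter(X, v, C):
    """MVSDF solution h = C^{-1} X (X^T C^{-1} X)^{-1} v."""
    Ci_X = np.linalg.solve(C, X)                 # C^{-1} X without forming the inverse
    return Ci_X @ np.linalg.solve(X.T @ Ci_X, v)

rng = np.random.default_rng(0)                   # made-up data, illustration only
D, M = 16, 3
X = rng.standard_normal((D, M))
v = np.ones(M)                                   # output constraints set to one
C = np.diag(rng.uniform(0.5, 2.0, size=D))       # assumed diagonal noise covariance

h_sdf = sdf_filter(X, v)
h_mv = mvsdf_filter(X, v, C)

# Both filters satisfy X^T h = v; MVSDF has the smaller output variance h^T C h
var_sdf = h_sdf @ C @ h_sdf
var_mv = h_mv @ C @ h_mv
print(var_sdf, var_mv)
```

Both filters meet the same constraints, so the only difference visible here is the output noise variance, which MVSDF minimizes by construction.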



One of the major drawbacks of the classical SDF and
MVSDF is that they only constrain the output at a single
point, the origin of the correlation plane, without
maximizing it. MVSDF in addition introduces the matrix
$C$ of size $D \times D$, which may be difficult to estimate and
can increase the computational burden. The MACE filter
aims to minimize the correlation energy over all the
images in the training set, which suppresses all
values in the correlation plane except at the origin,
making the origin attain the maximum value in the
correlation plane.

Making use of the frequency domain properties of
correlation, the correlation between the input image
$x_i$ and the filter $h$ in the frequency domain is given as

$$F(x_i) \star F(h) = \operatorname{diag}(F(x_i))^{*}\, F(h) \qquad (12)$$

where $a^{*}$ represents the complex conjugate of $a$,
$\operatorname{diag}(x_i)$ represents the diagonal matrix with the elements of
$x_i$ along the diagonal and $F(x_i)$ represents the
Discrete Fourier Transform of $x_i$. Thus the energy in the
Fourier domain can be expressed as

$$E_i = F(h)^{H} \operatorname{diag}(F(x_i))\, \operatorname{diag}(F(x_i))^{H}\, F(h) \qquad (13)$$

where $A^{H}$ represents the Hermitian (conjugate transpose) of $A$.
Summing the $E_i$ over $i$ and dividing
by $M$, the average correlation energy can be written as
$E_{avg} = F(h)^{H} B\, F(h)$, where

$$B = \operatorname{diag}\Big( \frac{1}{M} \sum_{i=1}^{M} F(x_i)^{*} \circ F(x_i) \Big) \qquad (14)$$

and $\circ$ denotes the element-wise product. By analyzing the
Fourier transform in terms of matrix
multiplication we know that $F^{-1} = (1/D)\, F^{H}$, hence
the constraint of equation 3 can be expressed in the Fourier domain as

$$F(X)^{H} F(h) = D\, v \qquad (15)$$

Applying the same analysis as used for MVSDF in the
Fourier domain, the final equation for the MACE
filter is

$$F(h) = B^{-1} F(X) \big( F(X)^{H} B^{-1} F(X) \big)^{-1} D\, v \qquad (16)$$
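A short numerical sketch of the MACE solution may help make the role of the factor $D$ concrete. It assumes random arrays as stand-ins for training images and NumPy's unnormalized FFT convention; `mace_filter` is an illustrative name, not the authors' code.

```python
import numpy as np

def mace_filter(images, v):
    """MACE solution for a stack of training images of shape (M, H, W).

    Returns F(h) as a flat complex vector of length D = H * W.
    """
    M = images.shape[0]
    FX = np.fft.fft2(images).reshape(M, -1).T        # D x M, columns are F(x_i)
    D = FX.shape[0]
    b = np.mean(np.abs(FX) ** 2, axis=1)             # diagonal of B
    Bi_FX = FX / b[:, None]                          # B^{-1} F(X)
    A = FX.conj().T @ Bi_FX                          # F(X)^H B^{-1} F(X), M x M
    return Bi_FX @ np.linalg.solve(A, D * v.astype(complex))

rng = np.random.default_rng(1)                       # made-up images, illustration only
images = rng.standard_normal((3, 8, 8))
v = np.ones(3)
Fh = mace_filter(images, v)

FX = np.fft.fft2(images).reshape(3, -1).T
D = FX.shape[0]
constraint = FX.conj().T @ Fh                        # Fourier-domain constraint, equals D * v

# Correlation plane of the first training image: conj(F(x)) * F(h), then inverse DFT
plane = np.fft.ifft2((FX[:, 0].conj() * Fh).reshape(8, 8)).real
print(constraint.real / D, plane[0, 0])
```

The value of the correlation plane at the origin equals the preset constraint (one here), which is the spatial-domain reading of the Fourier-domain constraint.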



Although the MACE filter does ensure a maximum at
the origin, its inability to generalize to image
characteristics unseen during training makes it
successful mainly for verification and registration.
Another drawback of the MACE filter is that, in order to
produce very sharp peaks in the correlation plane for
authentic images, it emphasizes high
spatial frequencies, which makes MACE filters
susceptible to noise.

The Optimal Trade-off Filter, which is a variant of the
MACE filter, minimizes the average correlation energy
with the addition of an appropriate noise term, typically
modeled as Gaussian with diagonal covariance matrix
$C$. Under the assumption that the effect of the noise on
the Fourier transform of the image is independent of and
uncorrelated with the images in the training set, the
variance of the noise in the correlation output can be
shown to be $F(h)^{H} C\, F(h)$ [7], where $C$ is a
diagonal matrix of dimension $D \times D$ containing the input
noise power spectral density values as its diagonal
elements. When the input noise is modeled as white, i.e.
$C = I$, minimizing the output variance $F(h)^{H} C\, F(h)$ results
in a filter that emphasizes low spatial frequencies, whereas
minimizing the average correlation energy
$F(h)^{H} B\, F(h)$ leads to a filter that emphasizes high spatial
frequencies. An optimal trade-off between
minimizing output variance and minimizing average
correlation energy results in the OTF [8], which minimizes
$F(h)^{H} T\, F(h)$ subject to the constraints in equation 3,
where $T = \alpha B + \sqrt{1 - \alpha^{2}}\, C$, $B$ is the same matrix as
used in the MACE filter and $\alpha$ is a constant
determined by the constraints of the problem. The value
of $\alpha$ determines the relative importance of noise: $\alpha = 0$
leads to the maximum noise tolerance filter and $\alpha = 1$ leads
to the MACE filter. Using the same argument used to derive
the MACE filter of equation 16, the solution for the
OTF is obtained as

$$F_{OTF}(h) = T^{-1} F(X) \big( F(X)^{H} T^{-1} F(X) \big)^{-1} D\, v \qquad (17)$$
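The effect of the trade-off parameter can be illustrated with a small sketch. It assumes white noise ($C = I$), random arrays in place of training images, and the illustrative helper names `spectrum_diag` and `otf_filter`; it simply evaluates the two competing criteria for a few values of $\alpha$.

```python
import numpy as np

def spectrum_diag(FX):
    """Diagonal of B: average power spectrum of the training images."""
    return np.mean(np.abs(FX) ** 2, axis=1)

def otf_filter(FX, v, alpha, c=None):
    """OTF solution with diagonal T = alpha*B + sqrt(1 - alpha^2)*C.

    FX: D x M matrix whose columns are the DFTs of the training images.
    c:  diagonal of C; defaults to white noise (C = I), an assumption here.
    """
    D = FX.shape[0]
    b = spectrum_diag(FX)
    c = np.ones(D) if c is None else np.asarray(c, dtype=float)
    t = alpha * b + np.sqrt(1.0 - alpha ** 2) * c    # diagonal of T
    Ti_FX = FX / t[:, None]
    A = FX.conj().T @ Ti_FX
    return Ti_FX @ np.linalg.solve(A, D * v.astype(complex))

rng = np.random.default_rng(2)                       # made-up images, illustration only
images = rng.standard_normal((3, 8, 8))
FX = np.fft.fft2(images).reshape(3, -1).T
v = np.ones(3)
b = spectrum_diag(FX)

results = {}
for alpha in (0.0, 0.5, 1.0):
    Fh = otf_filter(FX, v, alpha)
    energy = float(np.sum(b * np.abs(Fh) ** 2))      # F(h)^H B F(h)
    noise_var = float(np.sum(np.abs(Fh) ** 2))       # F(h)^H C F(h) with C = I
    results[alpha] = (energy, noise_var)
    print(alpha, energy, noise_var)
```

With $\alpha = 1$ the filter coincides with MACE and has the lowest correlation energy; with $\alpha = 0$ it has the lowest output noise variance, and intermediate values trade one against the other.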



III. DCT Sign-Only Representation and DCT Sign-Only
Correlation (DCTSOC)

The DCT, a key technology for image coding, can be
considered a special case of the DFT in which the phase
component is zero and the structural information present
in the phase part of the DFT is contained in the signs of the
DCT coefficients. Similar to the concept of Fourier phase-only
correlation, the idea behind DCT sign-only
correlation is to use the important information about the
features and details in an image at reduced representation
cost [9]. Also, the sign information of the DCT
coefficients (called DCT signs) is robust against scalar
quantization noise because positive signs do not change
to negative signs and vice versa. Moreover, the concise
expression of DCT signs saves the physical space needed to
calculate and store them. Because of these properties, target
image search and retrieval taking advantage of the DCT
signs in coded images has been studied [10, 11]. It should
be noted that the intelligibility of the sign-only
representation depends on the magnitude "smoothness"
of the signal being looked at. Since most natural images
contain mostly low-frequency content, their magnitude
rolls off quickly at high frequencies, and this leads to the
situation where the "high pass" interpretation of the
phase-only transform holds. Hence transformation into a
phase-only or sign-only image can also be approximately
interpreted as a high-pass filtering operation.

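Unlike the DFT, which transforms real input into complex
coefficients, the DCT transforms real input into real
coefficients. There are four different types of DCT; type
II is the one commonly used for image coding.
The 2D version of the DCT type II transform and its inverse
are given as

$$F_c(x(k_1, k_2)) = \frac{2}{\sqrt{N_1 N_2}}\, W(k_1) W(k_2) \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x(n_1, n_2) \cos\!\Big( \frac{(2 n_1 + 1) k_1 \pi}{2 N_1} \Big) \cos\!\Big( \frac{(2 n_2 + 1) k_2 \pi}{2 N_2} \Big) \qquad (18)$$

$$x(n_1, n_2) = F_c^{-1}(x(k_1, k_2)) = \frac{2}{\sqrt{N_1 N_2}} \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2-1} W(k_1) W(k_2)\, F_c(x(k_1, k_2)) \cos\!\Big( \frac{(2 n_1 + 1) k_1 \pi}{2 N_1} \Big) \cos\!\Big( \frac{(2 n_2 + 1) k_2 \pi}{2 N_2} \Big) \qquad (19)$$

where $x(n_1, n_2)$ is the 2D matrix with $0 \le n_1 < N_1$
and $0 \le n_2 < N_2$, and $F_c(x(k_1, k_2))$ represents the 2D
DCT with $0 \le k_1 < N_1$ and $0 \le k_2 < N_2$. The prefactor
reduces to $2/N$ for a square $N \times N$ image. The
normalization coefficient $W(k)$ is given as

$$W(n) = \begin{cases} 1/\sqrt{2} & (n = 0) \\ 1 & (n \neq 0) \end{cases} \qquad (20)$$

The signs of $F_c(x(k_1, k_2))$ are computed as

$$\operatorname{sgn}(z(i, j)) = \begin{cases} -1 & (z(i, j) < 0) \\ 0 & (z(i, j) = 0) \\ +1 & (z(i, j) > 0) \end{cases} \qquad (21)$$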
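The transform pair and the sign extraction can be sketched directly from the definitions above. This is a minimal illustration using an explicit orthonormal DCT-II matrix rather than a library routine; the function names are illustrative and the input is a random array, not a face image.

```python
import numpy as np

def dct_matrix(N):
    """N x N orthonormal DCT-II matrix (row k, column n): the 1D factor of the 2D transform."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)                      # W(0) = 1/sqrt(2)
    return C

def dct2(x):
    """2D DCT-II computed separably as C1 x C2^T."""
    N1, N2 = x.shape
    return dct_matrix(N1) @ x @ dct_matrix(N2).T

def idct2(X):
    """Inverse 2D DCT; the inverse of an orthonormal matrix is its transpose."""
    N1, N2 = X.shape
    return dct_matrix(N1).T @ X @ dct_matrix(N2)

def dct_signs(x):
    """DCT sign-only representation: sgn applied to every DCT coefficient."""
    return np.sign(dct2(x)).astype(np.int8)

rng = np.random.default_rng(3)                   # made-up image, illustration only
x = rng.standard_normal((6, 8))
signs = dct_signs(x)
roundtrip_error = np.max(np.abs(idct2(dct2(x)) - x))
print(signs.shape, roundtrip_error)
```

Each coefficient is stored as one of three values, so the sign-only representation needs at most two bits per coefficient instead of a floating-point number, and it is unchanged when the image is multiplied by a positive constant.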

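For a given image $x$ and a template $g$ with their
respective DCT coefficients $F_c(x)$ and $F_c(g)$, the normalized
correlation surface can be expressed as

$$R_c = \frac{F_c(x) \cdot F_c(g)}{\left| F_c(x) \cdot F_c(g) \right|} \qquad (22)$$

But since DCT coefficients are real
numbers, $x / |x| = \operatorname{sgn}(x)$, so the above equation can be
written as

$$R_c = \frac{F_c(x)}{|F_c(x)|} \cdot \frac{F_c(g)}{|F_c(g)|} = \operatorname{sgn}(F_c(x)) \cdot \operatorname{sgn}(F_c(g)) = F_{sc}(x) \cdot F_{sc}(g) \qquad (23)$$

where $F_{sc}(\cdot) = \operatorname{sgn}(F_c(\cdot))$ denotes the DCT sign-only
representation and the product is taken element-wise.
The correlation surface/plane of $x$ and $g$ in the spatial domain
is obtained by computing the inverse 2D DCT of $R_c$, i.e.
$r_c = F_c^{-1}(R_c)$.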
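As an illustration of DCT sign-only correlation, the sketch below correlates a random array with itself and with an unrelated array. It is a toy example under stated assumptions (orthonormal DCT-II built explicitly, random data instead of images); `dctsoc` is an illustrative name.

```python
import numpy as np

def dct_matrix(N):
    """N x N orthonormal DCT-II matrix (row k, column n)."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def dctsoc(x, g):
    """DCT sign-only correlation plane: inverse 2D DCT of sgn(DCT(x)) * sgn(DCT(g))."""
    C1, C2 = dct_matrix(x.shape[0]), dct_matrix(x.shape[1])
    Rc = np.sign(C1 @ x @ C2.T) * np.sign(C1 @ g @ C2.T)   # element-wise sign product
    return C1.T @ Rc @ C2                                   # inverse 2D DCT

rng = np.random.default_rng(4)                   # made-up data, illustration only
x = rng.standard_normal((16, 16))
other = rng.standard_normal((16, 16))

same = dctsoc(x, x)          # all sign products are +1, energy concentrates at the origin
diff = dctsoc(x, other)      # sign products are essentially random, no dominant peak
print(same[0, 0], same.max(), diff.max())
```

When the two inputs match, every sign product is +1 and the plane peaks at index (0, 0); for unrelated inputs the products behave like random signs and the plane stays flat.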

IV. Face recognition with OTF using DCTSOC 
(OTDS) 

In real-time biometric systems with hardware constraints
that use correlation filters, one of the biggest
bottlenecks in real-time performance is the
computationally intensive evaluation of the 2-D DFT of
images, not only for the filter design but also for the fast
computation of correlations during the testing phase [12]. Fast
algorithms based on two-bit quantization of DFT
coefficients to represent the phase information of 2D images,
termed Quad Phase MACE (QP-MACE) filters, have
been suggested [13].

In general, the MACE filter and the OTF make use of the
DFT for correlation, as explained in Section II. We
propose to use correlation based on DCT sign-only values.
Using the properties of the DCT explained in Section III, the
design equation for the standard OTF using the DCT sign-only
representation (OTDS) can be written as
$$F_{OTDS}(h) = \operatorname{sgn}\Big( T^{-1} F_{sc}(X) \big( F_{sc}(X)^{T} T^{-1} F_{sc}(X) \big)^{-1} D\, v \Big) \qquad (24)$$

where $T = \alpha B + \sqrt{1 - \alpha^{2}}\, C$ and, with $\circ$ the
element-wise product,

$$B = \operatorname{diag}\Big( \frac{1}{M} \sum_{i=1}^{M} F_{sc}(x_i) \circ F_{sc}(x_i) \Big) \qquad (25)$$

Fig. 1 shows the correlation outputs of the OTDS filter
designed using all the images of the first person from the
AMP expression database, when tested on a sample test
image of the first subject and a sample test image of the
second subject.

Figure 1. Correlation outputs of OTDS filter designed using images
of the first person from AMP expression database, when tested for
(A) a sample test image of the first subject (B) a sample test image of
second subject

One of the most commonly used similarity measures in
correlation filters uses the value of the largest
correlation peak as the match score. However, to
reduce the sensitivity of the match score to variations
such as illumination change and shift, a metric termed the
peak-to-sidelobe ratio (PSR), which measures the peak
sharpness of the resulting correlation plane, is used. The
PSR is computed by masking out a 5×5 rectangular
region centered at the peak and defining the
remaining annular region as the sidelobe region. The PSR
metric is defined as

$$PSR = \frac{peak - mean}{\sigma} \qquad (26)$$

where peak is the maximum correlation output, and
the mean and the standard deviation ($\sigma$) are computed over the
sidelobe region surrounding the peak region.
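The PSR computation can be sketched as follows. The text does not state the outer extent of the sidelobe region, so this sketch assumes it is the whole remaining plane and clips the 5×5 mask at the image borders; `psr` and its `mask_size` parameter are illustrative names.

```python
import numpy as np

def psr(plane, mask_size=5):
    """Peak-to-sidelobe ratio: (peak - sidelobe mean) / sidelobe standard deviation.

    A mask_size x mask_size window centred on the peak is excluded; the
    sidelobe region is assumed to be everything else in the plane, and the
    window is clipped at the borders.
    """
    plane = np.asarray(plane, dtype=float)
    r, c = np.unravel_index(np.argmax(plane), plane.shape)
    half = mask_size // 2
    sidelobe = np.ones(plane.shape, dtype=bool)
    sidelobe[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = False
    values = plane[sidelobe]
    return (plane[r, c] - values.mean()) / values.std()

rng = np.random.default_rng(5)                   # synthetic planes, illustration only
flat = rng.standard_normal((32, 32))             # no distinct peak
sharp = flat.copy()
sharp[10, 20] += 25.0                            # one dominant peak

psr_flat = psr(flat)
psr_sharp = psr(sharp)
print(psr_flat, psr_sharp)
```

A plane with a single dominant peak yields a much larger PSR than a plane of comparable noise without one, which is what makes the metric usable as a match score.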

A. Proposed Approach 

For a given set of N training images per subject, the
design procedure for the proposed algorithm can be
summarized as follows.

Input :

Read the training set of N images for each of the M
subjects

$$X^{k} = \{x_1^{k}, \ldots, x_N^{k}\}, \qquad k = 1, \ldots, M$$

Algorithm Training :

1. Obtain the bipolar representation from the DCT sign-only
representation:

$$F_{sc}(X^{k}) = \operatorname{sgn}(F_c(X^{k}))$$

2. Design the OTDS filter using DCT sign-only values:

$$F_{OTDS}(h^{k}) = \operatorname{sgn}\Big( T^{-1} F_{sc}(X^{k}) \big( F_{sc}(X^{k})^{T} T^{-1} F_{sc}(X^{k}) \big)^{-1} D\, v \Big)$$

3. Build a filter bank consisting of one filter per
subject:

$$F_{OTDS}(H) = \big( F_{OTDS}(h^{1}), \ldots, F_{OTDS}(h^{M}) \big)$$

Algorithm Testing :

1. Read the test image to be verified: $y$

2. Obtain its DCT sign-only representation:

$$F_{sc}(y) = \operatorname{sgn}(F_c(y))$$

3. Cross-correlate the DCT sign-only representation
with each of the filters in the designed filter bank and
compute the PSR from the correlation output of each of
the filters in the filter bank:

$$PSR(k) = PSR\Big( F_c^{-1}\big( F_{OTDS}(h^{k}) \cdot F_{sc}(y) \big) \Big), \qquad k = 1, \ldots, M$$

4. Identify the test image as the subject with the maximum PSR.
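The training and testing steps above can be sketched end to end. This is a toy illustration under several assumptions that are not from the paper: synthetic random patterns stand in for face images, the noise matrix is taken as $C = I$, $\alpha = 0.9$ is an arbitrary choice, and the names `train_otds` and `identify` are hypothetical.

```python
import numpy as np

def dct_matrix(N):
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    C[0, :] /= np.sqrt(2.0)
    return C

def dct2(x):
    return dct_matrix(x.shape[0]) @ x @ dct_matrix(x.shape[1]).T

def idct2(X):
    return dct_matrix(X.shape[0]).T @ X @ dct_matrix(X.shape[1])

def psr(plane, mask_size=5):
    r, c = np.unravel_index(np.argmax(plane), plane.shape)
    half = mask_size // 2
    side = np.ones(plane.shape, dtype=bool)
    side[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1] = False
    return (plane[r, c] - plane[side].mean()) / plane[side].std()

def train_otds(images, alpha=0.9):
    """Training steps 1-2 for one subject; images has shape (N, H, W)."""
    S = np.stack([np.sign(dct2(im)).ravel() for im in images], axis=1)  # D x N signs
    D, N = S.shape
    b = np.mean(S * S, axis=1)                        # diagonal of B from the sign vectors
    t = alpha * b + np.sqrt(1.0 - alpha ** 2)         # diagonal of T with C = I (assumed)
    Ti_S = S / t[:, None]
    h = Ti_S @ np.linalg.solve(S.T @ Ti_S, D * np.ones(N))
    return np.sign(h).reshape(images.shape[1:])       # bipolar OTDS filter

def identify(filter_bank, y):
    """Testing steps 2-4: returns (index of best subject, list of PSR values)."""
    sy = np.sign(dct2(y))
    scores = [psr(idct2(f * sy)) for f in filter_bank]
    return int(np.argmax(scores)), scores

# Synthetic stand-in for face data: each "subject" is a random base pattern
rng = np.random.default_rng(6)
bases = rng.standard_normal((3, 16, 16))
train = [b + 0.2 * rng.standard_normal((4, 16, 16)) for b in bases]
bank = [train_otds(imgs) for imgs in train]           # training step 3: filter bank

probe = bases[1] + 0.2 * rng.standard_normal((16, 16))
predicted, scores = identify(bank, probe)
print(predicted, scores)
```

Both the stored filters and the probe representation are bipolar, so the test-time correlation reduces to an element-wise sign product followed by one inverse DCT per filter.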

V. Evaluation method 

To measure how well the code book vectors obtained from
the algorithm represent the training dataset, the
following five tests are conducted.

A. Method I (In-Database test):

To test the learning ability, we use all the images
present in a given database as the training set. This gives
a measure of the learning ability of the algorithm for a given
database in terms of the identification rate when all images
present in the training set are used for testing.

B. Method II (Out-Database test):

To test the ability of the algorithm to identify known
faces with unknown variations, i.e. expressions unseen
during training, we use the hold-out/leave-one-out
method: all images with an identical
expression are excluded from the training set, and these held-out
images of each individual are used as the test
set. The identification rate of this test reveals the
performance of the algorithm in identifying subjects for a
particular expression unseen during training.

C. Method III (3 Fold Best):

In this model of evaluation, approximately one third of
the total expressions in a given database, those with the best
identification rate during the leave-one-out method (Method
II), are used for training. The test set consists
of all the images belonging to the rest of the expressions.
Note that the setup of our 3 fold method is different from
the commonly used 3 fold method, where two thirds of the
entire dataset is used for training and one third of the
dataset is used for testing.
D. Method IV (3 Fold Least):

This model of evaluation is similar to that of Method
III, but instead of using the expressions with the best
identification rate for training we use the expressions
with the lowest identification rate during the
hold-out/out-database test as the training set.

E. Method V (3 Fold Interim):

In this 3 fold mode of evaluation, the training set
consists of one third of the total number of expressions:
one third of the expressions in the training set are drawn from
the training sets of Method III and Method IV, and the rest
of the expressions are those which exhibited an
intermediate identification rate during the leave-one-out
test.
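The way Methods II to IV partition a database by expression can be sketched as follows. The rates used are hypothetical numbers for a six-expression database, "approximately one third" is interpreted as `round(len / 3)`, and the function names are illustrative; Method V is left out because its exact mix is not specified precisely enough to code.

```python
from typing import Dict, List, Tuple

def leave_one_expression_out(expressions: List[int]) -> List[Tuple[List[int], int]]:
    """Method II splits: each expression is held out once for testing."""
    return [([e for e in expressions if e != held], held) for held in expressions]

def three_fold_training_sets(rates: Dict[int, float]) -> Dict[str, List[int]]:
    """Pick training expressions for Methods III and IV from Method II rates.

    rates maps expression index -> identification rate obtained in Method II.
    "Approximately one third" is taken here as round(len / 3), an assumption.
    """
    ranked = sorted(rates, key=rates.get, reverse=True)   # best rate first
    third = max(1, round(len(ranked) / 3))
    return {
        "method_III_best": sorted(ranked[:third]),
        "method_IV_least": sorted(ranked[-third:]),
    }

# Hypothetical Method II rates for a 6-expression database (not from the paper)
example_rates = {1: 90.0, 2: 55.0, 3: 80.0, 4: 30.0, 5: 95.0, 6: 70.0}
splits = leave_one_expression_out(list(example_rates))
training_sets = three_fold_training_sets(example_rates)
print(splits[0])
print(training_sets)
```

In both three-fold variants the test set is every expression not selected for training, so the training set is the smaller part of the data, unlike the usual three-fold split.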

Based on the five evaluation methods discussed
above, we compare the results of the standard OTF approach
with the proposed OTDS approach on four standard
frontal face datasets of different sizes and with varied
constraints imposed on the acquired face images. For
computational efficiency the images in all the datasets
were downsampled to approximately 32x32 pixels while
maintaining their aspect ratio.

TABLE I.

Average Recognition Rate (%) of standard OTF and OTF-DSOC,
using each of the five evaluation methods on
Extended Yale-B face database.

               Evaluation methods
               Method        Method        Method   Method   Method
               I             II            III      IV       V
Standard OTF   89.0          86.3          74.9     55.7     84.1
OTDS           [illegible]   [illegible]   78.4     55.7     83.4

Out-Database Identification for Extended Yale-B Database



VI. Simulation Results 

A. Extended Yale-B face database 

The extended Yale Face Database B contains 16128
images of 37 human subjects under 64 illumination
conditions. Figure 3 shows the expressions/images of one of
the subjects.

To verify the robustness of the proposed method to
extreme lighting variations, we conducted
experiments on the cropped version of the extended Yale-B
database scaled to a size of 32x32 pixels. Table 1 gives a
comparative list of experimental results obtained using
the five evaluation methods discussed in Section V.
Figure 2 gives the identification rate of each
expression using the hold-out-expression/out-database
method.

For the evaluation methods given in Section V, the
identification rates obtained for the Extended Yale-B dataset
using the OTDS approach were almost identical to those of the
standard OTF approach. However, it should be noted that
in the case of evaluation Method III, when the filters were
designed with better expressions/images (i.e. expressions
with a high identification rate during the hold-out method),
an improvement of nearly 4% is observed for the OTDS
approach over the standard OTF approach.

© 2010 ACEEE
DOI: 01.ijsip.01.01.01

Figure 2. Out-data identification rate of each expression of the
Extended Yale-B database obtained using standard OTF and OTDS
approach.

TABLE II.

Average Recognition Rate (%) of standard OTF and OTF-DSOC,
using each of the five evaluation methods on AR
face database.

               Evaluation methods
               Method   Method   Method   Method   Method
               I        II       III      IV       V
Standard OTF   98.6     82.0     68.8     64.4     79.1
OTDS           95.5     83.9     79.0     64.4     85.6

Figure 3. Images of the first subject from Extended Yale-B face
database arranged sequentially (top left to bottom right) in terms of
the expression index 1-64.

B. Yale face database 

The Yale database from the Yale Center for Computational
Vision and Control contains 165 frontal face images
covering 15 individuals taken under 11 different
conditions: a normal image under ambient lighting, one
with or without glasses, three images taken with different
point light sources, and five different facial expressions.
Figure 4 illustrates images of a sample subject from
this database.

A comparison of identification rates between the proposed
OTDS approach and the standard OTF approach for the Yale
dataset is given in Table 3 and Figure 5. The
improvement in performance of the OTDS approach for
evaluation Methods II-V indicates that when the constraint
of representing the frontal face by a cropped mug-shot is
relaxed, and variations such as hair
style and head gear are allowed in the training/test set,
the proposed OTDS approach is able to provide a more
generalized filter representation.

Figure 4. Images of the first subject from Yale face database arranged
sequentially (top left to bottom right) in terms of the expression index
1-11 (Normal, Glasses, Happy, Left Light, Center Light, Surprised, Right
Light, Sad, Sleepy, Yawn, Wink)

Out-Database Identification for YALE Database
[Plot: identification rate versus expression index. Legend: OTDS (avg): 84.2427, OTF (avg): 80.6064]

Figure 5. Out-data identification rate of each of the 11 expressions of
Yale database obtained using standard OTF and OTDS approach.

C. AR face database 



TABLE III.

Average Recognition Rate (%) of standard OTF and OTF-DSOC,
using each of the five evaluation methods on Yale
face database.

               Evaluation methods
               Method   Method   Method   Method   Method
               I        II       III      IV       V
Standard OTF   99.3     80.6     -        -        -
OTDS           93.3     84.2     -        -        -

Values for Methods III-V, in extracted order (row and column
assignment not recoverable): 75.1, 31.8, 49.0, 72.1, 80.0, 89.6



Figure 6. Images of the first subject from AR face database arranged
sequentially (top left to bottom right) in terms of the expression index
1-13 (Neutral expression, Smile, Anger, Scream, Right light on, Left
light on, All sides lights on, Wearing sun glasses, Wearing sun glasses
and right light on, Wearing sun glasses and left light on, Wearing sun
glasses and all light on, Wearing scarf, Wearing scarf and left light on,
Wearing scarf and right light on).



Out-Database Identification for AR Database 
Figure 7. Out-data identification rate of each of the 13 expressions of
AR face database obtained using standard OTF and OTDS approach

The AR face database created by Aleix Martinez and
Robert Benavente consists of 126 subjects, 70 male and 56
female, with 13 images per person covering
variations in expression, illumination conditions and
occlusions. Figure 6 depicts sample images of one of
the subjects from this dataset.

To further demonstrate the performance of the
proposed approach, experiments were conducted using the AR
face database. The results in Table 2 and
Figure 7 confirm the robustness of the OTDS approach
over the standard OTF approach for datasets with
variations such as expression and head gear.

D. AMP expression face database 

This dataset has 13 subjects, each
represented by 75 images showing different
expressions. The face images were collected under the same
lighting conditions using a CCD camera and have been
well registered by their eye locations. Figure 8 shows
some expression images of one subject.

The experimental results tabulated in Table 4 indicate
that both approaches performed extremely well on the
AMP expression dataset, with little to
choose between them as far as this dataset is concerned.



Figure 8. Sample images of the first subject from AMP expression face
database
TABLE IV.

Average Recognition Rate (%) of standard OTF and OTF-DSOC,
using each of the five evaluation methods on AMP
Expression face database.

               Evaluation methods
               Method   Method   Method   Method   Method
               I        II       III      IV       V
Standard OTF   99.6     99.6     95.7     97.1     97.1
OTDS           99.3     99.3     96.6     97.1     97.0



Out-Database Identification for AMP Expression Database 
Figure 9. Out-data identification rate of each of the 75 expressions of
AMP Expression face database obtained using standard OTF and OTF-DSOC
approach



Conclusions 

In this paper, we have proposed and evaluated a DCT-based
version of the OTF using DCT sign-only
correlation, which is efficient in terms of both
computation and storage, as it deals with a bipolar
representation of images and filter coefficients in the DCT
domain, making the computation of correlation easier and faster.
Based on a comparison of the identification rates of the
proposed OTDS approach and the standard OTF
approach on four standard frontal face databases, the
following conclusions can be drawn:

The performance of the OTDS approach is comparable to the
standard OTF approach for datasets consisting of mug-shot
frontal faces with only illumination and expression
variations. The OTDS approach is as tolerant to illumination
variations as the standard OTF approach, requires less
storage and is computationally less complex
during testing, as it uses correlation between
bipolar vectors as the similarity measure for classification.

For datasets such as AR and Yale, where many of the
images contain occlusions (scarf and black
sunglasses), OTDS achieved better results than the
standard OTF approach and is hence a better alternative to the
standard OTF approach for frontal face recognition.